Search Result

Journals

Publication Years

Keywords

Please wait a minute...

For Selected:

Download Citations
EndNote Ris BibTeX

Toggle Thumbnails

Select

Playback speech detection algorithm based on modified cepstrum feature

LIN Lang, WANG Rangding, YAN Diqun, LI Can

Journal of Computer Applications 2018, 38 (6): 1648-1652. DOI: 10.11772/j.issn.1001-9081.2017112822

Abstract （518）

PDF （932KB）（297）

Save

With the development of speech technology, various kinds of phishing speech represented by playback speech have brought serious challenge for voiceprint authentication system and audio forensics technology. Aiming at the attack problem of playback speech to voiceprint authentication system, a new detection algorithm based on modified cepstrum feature was proposed. Firstly, the coefficient of variation was used to analyze the difference between the original speech and the playback speech in the frequency domain. Secondly, a new filter bank composed of inverse-Mel filters and linear filters was used to replace Mel filter bank in the process of extracting Mel Frequency Cepstral Coefficients (MFCC) pertinently, and then the modified cepstrum feature based on the new filter bank was obtained. Finally, Gaussian Mixture Model (GMM) was utilized as the classifier to classify and discriminate speech. The experimental results show that, the modified cepstrum feature can effectively detect the playback speech, and its equal error rate is about 3.45%.

Reference | Related Articles | Metrics

Select

Recaptured speech detection algorithm based on convolutional neural network

LI Can, WANG Rangding, YAN Diqun

Journal of Computer Applications 2018, 38 (1): 79-83. DOI: 10.11772/j.issn.1001-9081.2017071896

Abstract （531）

PDF （838KB）（379）

Save

Aiming at the problems that recaptured speech attack to speaker recognition system harms the rights and interests of legitimate users, a recaptured speech detection algorithm based on Convolutional Neural Network (CNN) was proposed. Firstly, the spectrograms of the original speech and the recaptured speech were extracted and input into the CNN for feature extraction and classification. Secondly, for the detection task, a new network architecture was constructed, and the effect of the spectrograms with different window shifts were discussed. Finally, the cross-over experiments for various recapture and replay devices were constructed. The experimental results demonstrate that the proposed method can accurately discriminate whether the detected speech is recaptured or not, and the recognition rate achieves 99.26%. Compared with the mute segment Mel-Frequency Cepstral Coefficient (MFCC) algorithm, channel mode noise algorithm and long window scale factor algorithm, the recognition rate is increased by about 26 percentage points, about 21 percentage points and about 0.35 percentage points respectively.

Reference | Related Articles | Metrics